Orthogonal Regression M-Estimators

Ruben H. Zamar* 
ABSTRACT

M-estimators in the orthogonal regression set-up are defined 
in order to produce a robust alternative to the method of classical 
orthogonal regression. Asymptotic qualitative robustness of some 
orthogonal M-estimators is proved. The influence curve of these 
estimators is computed and some additional asymptotic theory is 
presented. 
1. Introduction


The error-in-variables model and the method of orthogonal regression have 
received a great deal of attention in the statistics literature (see, for example, 
Anderson (1984) and Moran (1971) for extensive reviews and references). The 
method of orthogonal regression enjoys the following properties: 

(P1) It is the ML procedure at the Gaussian error-in-variables model.

(P2) It gives a symmetric treatment to all the variables under study. 

(P3) It allows for additional information about the variances of the
observational errors (which might be available) to be used in order to
obtain consistent estimators of the regression coefficients.

These are features which may render orthogonal regression more or less
attractive, depending on the situation at hand.

A major disadvantage of orthogonal regression is that, like ordinary least
squares regression, it lacks robustness: it is not resistant to outliers (see
Brown, 1982), it lacks efficiency robustness, its influence curve is unbounded
(see Kelly, 1984), and its breakdown point is 1/n (see Hampel, 1971; Huber,
1981).

This paper studies a class of robust alternatives to orthogonal regression
estimates, namely orthogonal regression M-estimates (ORM's). ORM-estimators
have the property that they are MLE's for appropriate non-Gaussian
error-in-variables models. They also enjoy properties (P2) and (P3).

In section 2, ORM-estimators are defined in the context of both symmetric 
and asymmetric models, the latter being the usual regression model with 
response and carriers. The existence and uniqueness of ORM-estimators are
established, under suitable assumptions, in section 3. Section 4 is devoted to
ORM-estimator influence curves and some asymptotic bias calculations. Finally, 
section 5 treats the following asymptotic properties of ORM-estimators for both 
the structural and functional models: consistency, asymptotic normality and 
qualitative robustness. 
2. Orthogonal Regression M-Estimators


Given a unit vector $a$ in $\mathbb{R}^{p+1}$ and a real number $b$, the hyperplane $H(a,b)$ is
defined as

    $H = H(a,b) = \{x \in \mathbb{R}^{p+1} : a'x = b\}$ .    (2.1)

Let $\{x_1, \ldots, x_n\}$ be a set of $n$ data points in $\mathbb{R}^{p+1}$. The method of
orthogonal regression (OR) consists of solving the following minimization problem:

    $\min_{\|a\|=1,\; b \in \mathbb{R}} \sum_{i=1}^{n} d^2(x_i, H(a,b))$    (2.2)

where

    $d(x,H) = |a'x - b| = \min_{y \in H} \|x - y\|$    (2.3)

is the perpendicular (Euclidean) distance between $x = (x_0, x_1, \ldots, x_p)'$ and $H$.
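The two expressions for the distance in (2.3) can be checked numerically. The following is a minimal sketch in plain Python; the function names are illustrative and not part of the paper.

```python
import math

def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))

def hyperplane_distance(a, b, x):
    """Perpendicular distance |a'x - b| from x to H(a, b); a must be a unit vector."""
    if abs(math.sqrt(dot(a, a)) - 1.0) > 1e-9:
        raise ValueError("a must have unit length")
    return abs(dot(a, x) - b)

def project_onto_hyperplane(a, b, x):
    """Closest point of H(a, b) to x, obtained by moving x along the normal a."""
    r = dot(a, x) - b
    return [xi - r * ai for ai, xi in zip(a, x)]

a, b, x = [0.6, 0.8], 1.0, [3.0, 4.0]
y = project_onto_hyperplane(a, b, x)
print(hyperplane_distance(a, b, x))  # distance computed as |a'x - b|
print(math.dist(x, y))               # distance computed as ||x - y||
```

Both printed values agree because the projection `y` lies on the hyperplane and `x - y` is parallel to `a`.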

This view of orthogonal regression highlights the symmetry of the method,
that is, no particular coordinate $x_i$ of $x$ is regarded as a response, with the
remaining coordinates of $x$ being predictors. However, orthogonal regression is
often treated from the viewpoint of a linear model with coefficients
$\beta_0, \beta_1, \ldots, \beta_p$ corresponding to an intercept and $p$ predictor (or carrier)
variables $x_1, \ldots, x_p$. Taking $x_0$ as the response, our symmetric form of orthogonal
regression (2.2)-(2.3) can be cast in terms of the linear model coefficients as
follows. Let

    $\delta = \left(1 + \sum_{i=1}^{p} \beta_i^2\right)^{-1/2}$ ,  $a = \delta(1, -\beta_1, \ldots, -\beta_p)'$ ,  $b = \delta\beta_0$ .    (2.4)

Then

    $d^2(x,H) = \delta^2 (x_0 - \beta_0 - \beta_1 x_1 - \cdots - \beta_p x_p)^2$    (2.5)

and solving the minimization problem

    $\min_{\beta \in \mathbb{R}^{p+1}} \sum_{i=1}^{n} [\delta (x_{0i} - \beta_0 - \beta_1 x_{1i} - \cdots - \beta_p x_{pi})]^2$    (2.6)

is almost equivalent to solving (2.2). In fact the two solutions will be equivalent
when (2.2) yields $a_0 \ne 0$. For if $a_0 \ne 0$, we can express a solution
$a = (a_0, \ldots, a_p)'$, $\|a\| = 1$, in the form (2.4) by letting $\beta_i = -a_i/a_0$, $i = 1, \ldots, p$,
and $\delta = (1 + \sum \beta_i^2)^{-1/2}$. Note that a solution to (2.2) with $a_0 = 0$ corresponds to
$\beta_1 = \infty$ in the case $p = 1$, and to a hyperplane which is parallel to the $x_0$ direction
in the general case. Of course data sets producing an estimate $a$ with $a_0 = 0$ will
have probability zero under any "reasonable" distribution model for the data.
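The correspondence (2.4) between the linear-model coefficients and the symmetric parameters, and the identity (2.5), can be sketched as follows. The function names are illustrative; the inverse map uses $\beta_0 = b/a_0$, which follows from $b = \delta\beta_0$ and $a_0 = \delta$ and is stated here as an inference rather than quoted from the text.

```python
import math

def beta_to_hyperplane(beta):
    """Map (beta_0, ..., beta_p) to (a, b, delta) following (2.4)."""
    delta = 1.0 / math.sqrt(1.0 + sum(bk * bk for bk in beta[1:]))
    a = [delta] + [-delta * bk for bk in beta[1:]]
    return a, delta * beta[0], delta

def hyperplane_to_beta(a, b):
    """Inverse map, valid only when a_0 != 0 (beta_0 = b / a_0 is an inferred step)."""
    if a[0] == 0.0:
        raise ValueError("a_0 = 0: hyperplane is parallel to the x_0 direction")
    return [b / a[0]] + [-ak / a[0] for ak in a[1:]]

beta = [1.0, 2.0, -2.0]
a, b, delta = beta_to_hyperplane(beta)
x = [5.0, 1.0, 1.0]
residual = x[0] - beta[0] - sum(bk * xk for bk, xk in zip(beta[1:], x[1:]))
signed_distance = sum(ai * xi for ai, xi in zip(a, x)) - b
# (2.5): the signed orthogonal distance equals delta times the regression residual
print(delta * residual, signed_distance)
print(hyperplane_to_beta(a, b))
```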


As is well known, problem (2.6) yields the Gaussian MLE for the following
error-in-variables (EV) model

    $X_0 = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p$    (2.7)

    $x = X + \varepsilon$    (2.7')

where $x = (x_0, x_1, \ldots, x_p)'$, $X = (X_0, X_1, \ldots, X_p)'$, and $\varepsilon = (\varepsilon_0, \ldots, \varepsilon_p)'$ is
$N(0, \sigma^2 I)$. When we take $n$ observations on the above model we write

    $x_i = X_i + \varepsilon_i$

where $x_i = (x_{0i}, x_{1i}, \ldots, x_{pi})'$, $X_i = (X_{0i}, X_{1i}, \ldots, X_{pi})'$, and
$\varepsilon_i = (\varepsilon_{0i}, \varepsilon_{1i}, \ldots, \varepsilon_{pi})'$ for $i = 1, \ldots, n$. If $X_1, \ldots, X_n$ are random iid vectors
we have a structural error-in-variables model. On the other hand, if $X_1, \ldots, X_n$
are fixed (non-random) but unknown incidental parameters we have a
functional error-in-variables model.

If $a_0$ is a unit vector in $\mathbb{R}^{p+1}$ and $b_0 \in \mathbb{R}$, the symmetric formulation of the
model (2.7)-(2.7') is

    $a_0'X = b_0$    (2.9)

























In the linear regression setup (2.7) $m_0 = (\beta_0, 0, \ldots, 0)'$, and in the
symmetric setup (2.9) $m_0 = b_0 a_0$.

Further perusal of the linear regression formulation of the EV model leads
to a nice geometric motivation for the definition of the orthogonal regression
M-estimate (ORM). First one verifies that

    $b_1 = \delta(\beta_1, 1, 0, \ldots, 0)'$
    $b_2 = \delta(\beta_2, 0, 1, \ldots, 0)'$
    ...
    $b_p = \delta(\beta_p, 0, 0, \ldots, 1)'$    (2.10)

is a (non-orthogonal) basis for $V$. Since $a_0$ is orthogonal to $V$ any vector $x \in \mathbb{R}^{p+1}$
can be written as

    $x = m_0 + \sum_{k=1}^{p} z_k b_k + e a_0$  for some $z_1, \ldots, z_p, e \in \mathbb{R}$    (2.11)

where $\sum_{k=1}^{p} z_k b_k \in V$, $m_0 + \sum_{k=1}^{p} z_k b_k$ is the component of $x$ which lies in the
noise-free model (2.7), and $e a_0$ is orthogonal to the noise-free model (2.7). The simple case
$p = 1$ and $\beta_0 = 0$ is illustrated in Figure 2.2 below.

The coefficients $z_1, \ldots, z_p$ are easily determined by taking the inner product
of $c_k = (0, 0, \ldots, 1, \ldots, 0)'$ with both sides of (2.11), where the "1" is in
the $(k+1)$st position of $c_k$. This gives

    $z_k = \delta^{-1}(x_k + \delta\beta_k e)$ .    (2.12)

Finally $e$ is determined by premultiplying both sides of (2.11) by $a_0'$, which gives

    $e = \delta\left(x_0 - \beta_0 - \sum_{k=1}^{p} \beta_k x_k\right)$ .    (2.13)

It is straightforward to check that

    $\frac{\partial e}{\partial \beta_k} = -\delta^2 z_k$    $(k = 1, \ldots, p)$ .    (2.14)
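The decomposition (2.11) and the derivative identity (2.14) are easy to confirm numerically. The sketch below, with illustrative function names, computes $e$ and $z_k$ from (2.12)-(2.13) in the linear regression setup ($m_0 = (\beta_0, 0, \ldots, 0)'$), rebuilds $x$ from them, and compares a central finite difference of $e$ with $-\delta^2 z_k$.

```python
import math

def delta_of(beta):
    """delta = (1 + sum_k beta_k^2)^(-1/2), with the sum over k = 1..p."""
    return 1.0 / math.sqrt(1.0 + sum(bk * bk for bk in beta[1:]))

def decompose(beta, x):
    """Return (e, z) from (2.13) and (2.12) for x = (x_0, ..., x_p)."""
    d = delta_of(beta)
    residual = x[0] - beta[0] - sum(bk * xk for bk, xk in zip(beta[1:], x[1:]))
    e = d * residual
    z = [(xk + d * bk * e) / d for bk, xk in zip(beta[1:], x[1:])]
    return e, z

def recompose(beta, e, z):
    """Rebuild x = m_0 + sum_k z_k b_k + e a_0 as in (2.11)."""
    d = delta_of(beta)
    p = len(beta) - 1
    x = [beta[0]] + [0.0] * p                    # m_0 = (beta_0, 0, ..., 0)'
    for k in range(1, p + 1):
        x[0] += z[k - 1] * d * beta[k]           # first coordinate of b_k
        x[k] += z[k - 1] * d                     # (k+1)st coordinate of b_k
    a0 = [d] + [-d * bk for bk in beta[1:]]
    return [xi + e * ai for xi, ai in zip(x, a0)]

beta, x = [0.5, 2.0], [3.0, 1.0]
e, z = decompose(beta, x)
print(recompose(beta, e, z))                     # reproduces x up to rounding

h = 1e-6                                         # central difference in beta_1
e_plus, _ = decompose([0.5, 2.0 + h], x)
e_minus, _ = decompose([0.5, 2.0 - h], x)
print((e_plus - e_minus) / (2 * h), -delta_of(beta) ** 2 * z[0])
```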


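FIGURE 2.2

Then, since the minimization problem (2.6) can be expressed in terms of the
residuals $e_i = e(x_i, \beta)$ as the minimization of $\frac{1}{n}\sum_{i=1}^{n} e_i^2$, differentiation with
respect to $\beta$, together with (2.14), shows that the orthogonal regression estimate
satisfies the equations

    (a)  $\frac{1}{n} \sum_{i=1}^{n} e_i z_{ki} = 0$ ,  $k = 1, \ldots, p$

    (b)  $\frac{1}{n} \sum_{i=1}^{n} e_i = 0$ .    (2.16)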


This leads to a natural definition of an orthogonal regression M-estimator:

    $\frac{1}{n} \sum_{i=1}^{n} \rho(e_i) = \min!$    (2.17)

where $\rho$ is a symmetric robustifying function. Differentiation with respect to $\beta$
gives the following ORM estimating equations:

    (a)  $\frac{1}{n} \sum_{i=1}^{n} \psi(e_i) z_{ki} = 0$ ,  $k = 1, \ldots, p$

    (b)  $\frac{1}{n} \sum_{i=1}^{n} \psi(e_i) = 0$    (2.18)

where $\psi = \rho'$.
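To make (2.17) and (2.18) concrete, the following sketch evaluates the ORM objective and estimating equations for $p = 1$. It is an illustration under stated assumptions: the loss is the standard Tukey bisquare with tuning constant 4.7 (the value quoted in section 5; the exact normalization used in the paper is not recoverable here), the data set is made up, and the function names are hypothetical.

```python
import math

C = 4.7  # tuning constant; the bisquare form below is the standard one (assumption)

def rho(t):
    """Standard Tukey bisquare loss, bounded by C^2 / 6."""
    if abs(t) >= C:
        return C * C / 6.0
    return C * C / 6.0 * (1.0 - (1.0 - (t / C) ** 2) ** 3)

def psi(t):
    """Derivative of rho; vanishes for |t| >= C (redescending)."""
    if abs(t) >= C:
        return 0.0
    return t * (1.0 - (t / C) ** 2) ** 2

def residual_parts(beta0, beta1, point):
    """Return (e, z_1) from (2.13) and (2.12) for a point (x_0, x_1)."""
    x0, x1 = point
    delta = 1.0 / math.sqrt(1.0 + beta1 * beta1)
    e = delta * (x0 - beta0 - beta1 * x1)
    return e, (x1 + delta * beta1 * e) / delta

def orm_objective(beta0, beta1, data, loss=rho):
    """Left-hand side of (2.17)."""
    return sum(loss(residual_parts(beta0, beta1, pt)[0]) for pt in data) / len(data)

def estimating_equations(beta0, beta1, data, score=psi):
    """Left-hand sides of (2.18a) and (2.18b), in that order."""
    parts = [residual_parts(beta0, beta1, pt) for pt in data]
    n = len(data)
    return (sum(score(e) * z for e, z in parts) / n,
            sum(score(e) for e, _ in parts) / n)

# Five points exactly on x_0 = x_1 plus one gross outlier.
data = [(t, t) for t in (-2.0, -1.0, 0.0, 1.0, 2.0)] + [(10.0, 0.0)]
print(estimating_equations(0.0, 1.0, data))                         # ORM equations (2.18)
print(estimating_equations(0.0, 1.0, data, score=lambda t: t))      # OR equations (2.16)
print(orm_objective(0.0, 1.0, data), orm_objective(1.0, 1.0, data))
```

At the true line the bisquare equations are satisfied exactly because the outlier receives zero weight, whereas the OR equations (the case $\psi(t) = t$) are not.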


MLE MOTIVATION 

There is also a maximum likelihood rationale for the above definition of 
ORM-estimators. Let 


    $y_i = \beta_0 + \beta_1 X_i + v_i$

    $x_i = X_i + u_i$    (2.19)

be the usual error-in-variables model with all the classical assumptions in force
except that the joint distribution of the error $(u,v)$ is not Gaussian but of the
form

    $f(u,v) = K e^{-\rho(u^2 + v^2)}$ .    (2.20)

The log-likelihood function for this model is

    $\ell(\beta_0, \beta_1, X_1, \ldots, X_n) = n\log(K) - \sum_{i=1}^{n} \rho[(y_i - \beta_0 - \beta_1 X_i)^2 + (x_i - X_i)^2]$    (2.21)
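Differentiation with respect to all the parameters (structural and incidental)
gives

    $\frac{\partial \ell}{\partial \beta_0} = 0 \Rightarrow \frac{1}{n}\sum_{i=1}^{n} \rho'[(y_i - \beta_0 - \beta_1 X_i)^2 + (x_i - X_i)^2](y_i - \beta_0 - \beta_1 X_i) = 0$    (2.22a)

    $\frac{\partial \ell}{\partial \beta_1} = 0 \Rightarrow \frac{1}{n}\sum_{i=1}^{n} \rho'[(y_i - \beta_0 - \beta_1 X_i)^2 + (x_i - X_i)^2](y_i - \beta_0 - \beta_1 X_i) X_i = 0$    (2.22b)

    $\frac{\partial \ell}{\partial X_i} = 0 \Rightarrow \rho'[(y_i - \beta_0 - \beta_1 X_i)^2 + (x_i - X_i)^2][(y_i - \beta_0 - \beta_1 X_i)\beta_1 + (x_i - X_i)] = 0$ ,
    $(i = 1, \ldots, n)$ .    (2.22c)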


Equations (2.22c) are all satisfied if we set

    $X_i = \frac{x_i + \beta_1 (y_i - \beta_0)}{1 + \beta_1^2}$    $(i = 1, \ldots, n)$ .    (2.23)


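Inserting (2.23) into (2.22a) and (2.22b) we get the following MLE estimating
equations:

    $\frac{1}{n}\sum_{i=1}^{n} \psi\left(\frac{y_i - \beta_0 - \beta_1 x_i}{(1 + \beta_1^2)^{1/2}}\right) \frac{x_i + \beta_1 (y_i - \beta_0)}{1 + \beta_1^2} = 0$

    $\frac{1}{n}\sum_{i=1}^{n} \psi\left(\frac{y_i - \beta_0 - \beta_1 x_i}{(1 + \beta_1^2)^{1/2}}\right) = 0$    (2.24)

where $\psi(t) = \rho'(t^2)\,t$. But equations (2.24) are also the estimating equations of the
following (ORM-estimator) optimization problem:

    $\min_{\beta_0, \beta_1} \frac{1}{n}\sum_{i=1}^{n} \tilde{\rho}\left(\frac{y_i - \beta_0 - \beta_1 x_i}{(1 + \beta_1^2)^{1/2}}\right)$    (2.25)

where $\tilde{\rho}(t) = \frac{1}{2}\rho(t^2)$ and $\psi(t) = \tilde{\rho}'(t) = \rho'(t^2)\,t$. This carries over to the general $p$
case, as shown in the Appendix.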
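The step from (2.22c) to (2.24) rests on the fact that (2.23) is the orthogonal projection of the observation onto the line, so that the argument of $\rho'$ collapses to the squared orthogonal residual. A small numerical check, with illustrative function names:

```python
def fitted_true_value(beta0, beta1, x, y):
    """X_i from (2.23): abscissa of the orthogonal projection of (x, y) onto the line."""
    return (x + beta1 * (y - beta0)) / (1.0 + beta1 ** 2)

def squared_error(beta0, beta1, x, y, X):
    """Argument of rho in (2.21) for a candidate incidental parameter X."""
    return (y - beta0 - beta1 * X) ** 2 + (x - X) ** 2

beta0, beta1, x, y = 1.0, 2.0, 3.0, 2.0
X_hat = fitted_true_value(beta0, beta1, x, y)
# After inserting (2.23) the argument equals the squared orthogonal residual:
print(squared_error(beta0, beta1, x, y, X_hat))
print((y - beta0 - beta1 * x) ** 2 / (1.0 + beta1 ** 2))
```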









3. Existence and Uniqueness of ORM-Estimators 


We now show that given a sequence $x_1, \ldots, x_n$ of observations in $\mathbb{R}^{p+1}$ the
minimization problem

    $\min_{\|a\|=1,\; b \in \mathbb{R}} \frac{1}{n}\sum_{i=1}^{n} \rho(a'x_i - b)$    (3.1)

has at least one solution. In general the solution to (3.1) is not unique but we 
will show that appropriate side constraints can be imposed in order to force 
uniqueness. On the other hand we'll see later on that it is not necessary to force 
uniqueness in the context of asymptotics since all solutions to (3.1) will enjoy 
the same asymptotic behavior. We consider two cases. 


MONOTONE $\psi$

For each unit vector $a$ in $\mathbb{R}^{p+1}$ let

    $B(a) = \left\{ b : \min_{\tilde{b}} \frac{1}{n}\sum_{i=1}^{n} \rho(a'x_i - \tilde{b}) = \frac{1}{n}\sum_{i=1}^{n} \rho(a'x_i - b) \right\}$ .    (3.2)

Monotonicity of $\psi = \rho'$ implies that $\rho(t) \to +\infty$ as $t \to \pm\infty$, and therefore it follows
as in Huber (1964) that $B(a)$ is a non-empty, bounded and closed interval.

Lemma 3.1: Let $a, c$ denote unit vectors in $\mathbb{R}^{p+1}$. For each $\epsilon > 0$ there exists $\delta > 0$
such that if $\|a - c\| < \delta$ and $b_1 \in B(a)$ then there exists $b_2 \in B(c)$ with
$|b_1 - b_2| < \epsilon$.

Proof: See the Appendix. 











Now we select a particular element of $B(a)$, namely

    $F(a) = \min B(a)$ .    (3.3)

Lemma 3.2: $F(a)$ is a uniformly continuous function of $a$.

Proof: See the Appendix.

Lemma 3.3: Given $x_1, \ldots, x_n$, for all $\|a\| = 1$ and $b \in \mathbb{R}$

    $\sum \rho(a'x_i - b) \ge \sum \rho(a'x_i - F(a))$ .    (3.4)

Proof: If $b_0 = F(a)$ and $b \in \mathbb{R}$ then

    $\sum \rho(a'x_i - b) = \sum \rho[a'x_i - b_0 + (b_0 - b)]$
    $= \sum \rho(a'x_i - b_0) + (b_0 - b)\sum \psi[a'x_i - b_0 + \alpha_i (b_0 - b)]$    (3.5)

where $0 \le \alpha_i \le 1$ for $i = 1, \ldots, n$. Since $t\,\psi(s + t) \ge t\,\psi(s)$ for all $s, t \in \mathbb{R}$ we have

    $\sum \rho(a'x_i - b) \ge \sum \rho(a'x_i - b_0) + (b_0 - b)\sum \psi(a'x_i - b_0)$
    $= \sum \rho(a'x_i - b_0)$ .    (3.6)

The above Lemmas yield the following:

Proposition 3.1: The minimization problem (3.1) has at least one solution.

Proof: Since the right hand side of (3.4) is a continuous function of $a$ it achieves
its minimum at some vector $\tilde{a}$, say. If $\tilde{b} = F(\tilde{a})$ then $(\tilde{a}, \tilde{b})$ is a solution to (3.1).
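For a monotone $\psi$, the quantity $F(a) = \min B(a)$ of (3.3) can be located by bisection on the non-increasing function $b \mapsto \sum \psi(a'x_i - b)$. The sketch below uses Huber's $\psi$ with the conventional constant 1.345; both the choice of $\psi$ and the function names are assumptions made for illustration.

```python
def huber_psi(t, k=1.345):
    """Huber's monotone psi (an illustrative choice; k = 1.345 is a common default)."""
    return max(-k, min(k, t))

def smallest_location_root(a, points, psi=huber_psi, iterations=200):
    """Approximate F(a) = min B(a): the smallest b with sum psi(a'x_i - b) <= 0."""
    y = [sum(ai * xi for ai, xi in zip(a, x)) for x in points]
    lo, hi = min(y), max(y)

    def g(b):
        # Non-increasing in b because psi is non-decreasing.
        return sum(psi(yi - b) for yi in y)

    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid      # root lies to the right
        else:
            hi = mid      # keep the leftmost point where g <= 0
    return hi

points = [[1.0, 5.0], [2.0, -1.0], [3.0, 0.0]]
print(smallest_location_root([1.0, 0.0], points))  # projections 1, 2, 3
print(smallest_location_root([0.0, 1.0], points))  # projections 5, -1, 0
```

Minimizing the outer function $a \mapsto \sum \rho(a'x_i - F(a))$ over the unit sphere, as in the proof of Proposition 3.1, would require a separate search that is not sketched here.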









We note that when $\rho$ is bounded the $\psi$-function in the ORM estimating
equation is of the redescending type. It will be shown in the next section that under
certain conditions an ORM-estimator with a bounded loss function $\rho$ is robust in
a precise technical sense.

Proposition 3.2: Let $\rho$ be a non-negative, even function which is non-decreasing
on $[0, \infty)$ and such that

    $0 < \lim_{|t| \to \infty} \rho(t) = M < \infty$ .

Then (3.1) has at least one solution.


Proof: Let $0 < \delta < M/n$ and $t_0 > 0$ be such that

    $|t| > t_0 \Rightarrow \rho(t) > M - \delta$ .

Let $K = \max_{1 \le i \le n} \|x_i\|$. If $|b| > t_0 + K$ then

    $|a'x_i - b| \ge |b| - |a'x_i| \ge |b| - K > t_0$ .

Consequently if $|b| > t_0 + K$, for all $\|a\| = 1$ and all $i = 1, \ldots, n$

    $\rho(a'x_i - b) > M - \delta$    (3.10)

and

    $\frac{1}{n}\sum_{i=1}^{n} \rho(a'x_i - b) > M - \delta$ .    (3.11)

Since $\frac{1}{n}\sum \rho(a'x_i - b)$ is a continuous function of $(a,b)$ it achieves its
minimum on the compact set

    $A = \{(a,b) : \|a\| = 1,\ |b| \le t_0 + K\}$ .    (3.12)

Let $c$ be such that $c'x_1 = 0$ and $\|c\| = 1$. Now $(c,0) \in A$ so that by definition of $\delta$

    $\min_{(a,b) \in A} \frac{1}{n}\sum_{i=1}^{n} \rho(a'x_i - b) \le \frac{1}{n}\sum_{i=1}^{n} \rho(c'x_i) \le \frac{n-1}{n} M < M - \delta$
    $\le \inf_{(a,b) \notin A} \frac{1}{n}\sum_{i=1}^{n} \rho(a'x_i - b)$ .    (3.12')

Therefore $\frac{1}{n}\sum \rho(a'x_i - b)$ achieves its minimum when $(a,b)$ ranges on
$\{(a,b) : \|a\| = 1,\ b \in \mathbb{R}\}$ and the minimum can be achieved only at elements of $A$.


FORCING UNIQUENESS 


So far there is no assurance of uniqueness of solutions in either of the two
cases considered. However, when the solution set is not a singleton, it is
possible to select a particular solution $\hat{\theta}_n = (\hat{a}_n, \hat{b}_n)$ from the (compact) solution set
in a systematic way, as we now show. Let

    $B = \left\{ (a,b) : \frac{1}{n}\sum_{i=1}^{n} \rho(a'x_i - b) = \min_{\|c\|=1,\; d \in \mathbb{R}} \frac{1}{n}\sum_{i=1}^{n} \rho(c'x_i - d) \right\}$ .    (3.13)

Among all the minimizing pairs $(a,b)$ we take only those with the smallest
possible non-negative $b$-component to form the set $B_0$, that is

    $\tilde{b} = \inf\{b \ge 0 : (a,b) \in B\}$ ,
    $B_0 = \{(a,b) \in B : b = \tilde{b}\}$ .    (3.14)

Now let $\{u_1, \ldots, u_{p+1}\}$ be the canonical basis of $\mathbb{R}^{p+1}$ and for $k = 1, \ldots, p+1$ let

    $B_k = \{(a,b) \in B_{k-1} : \|a - u_k\| \le \|c - u_k\| \ \ \forall (c,b) \in B_{k-1}\}$ .    (3.15)

That is, among all the pairs $(c,b)$ in $B_{k-1}$ we take only those which are nearest to
$u_k$ to form the set $B_k$. Finally the following lemma provides us with a uniquely
defined element of $B$ which we shall label as $\hat{\theta}_n$.









Lemma 3.4: $B_{p+1}$ defined above contains exactly one element.

Proof: Assume that $(a,b)$ and $(c,b)$ are in $B_{p+1}$. Then $\|a\| = \|c\| = 1$ and

    $\|a - u_k\|^2 = \|c - u_k\|^2$ ,  $k = 1, \ldots, p+1$ .    (3.16)

But if $a = (a_1, \ldots, a_{p+1})'$ and $c = (c_1, \ldots, c_{p+1})'$ then

    $\|a - u_k\|^2 = (a_k - 1)^2 + \sum_{j \ne k} a_j^2 = 1 - 2a_k + \sum_{j=1}^{p+1} a_j^2 = 2(1 - a_k)$    (3.17)

and the lemma follows since $2(1 - a_k) = \|a - u_k\|^2 = \|c - u_k\|^2 = 2(1 - c_k)$ implies
that $a = c$.
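The selection rule (3.14)-(3.15) amounts to a lexicographic tie-break: smallest non-negative $b$ first, then, by (3.17), the largest $a_1$, then the largest $a_2$, and so on. The sketch below applies it to a finite list of candidate minimizers. The finite list, the tolerance and the function name are assumptions for illustration only, since in general the solution set need not be finite.

```python
import math

def select_solution(candidates, tol=1e-12):
    """Apply (3.14)-(3.15) to a finite list of (a, b) pairs with unit vectors a."""
    nonneg = [(a, b) for a, b in candidates if b >= 0.0]
    b_min = min(b for _, b in nonneg)
    current = [(a, b) for a, b in nonneg if b - b_min <= tol]          # the set B_0
    for k in range(len(current[0][0])):
        # By (3.17), ||a - u_k||^2 = 2 (1 - a_k): nearest to u_k means largest a_k.
        best = max(a[k] for a, _ in current)
        current = [(a, b) for a, b in current if best - a[k] <= tol]   # the set B_k
    return current

candidates = [([0.6, 0.8], 1.0), ([0.8, 0.6], 1.0),
              ([-0.6, -0.8], -1.0), ([0.8, -0.6], 1.0)]
print(select_solution(candidates))

a, u1 = [0.8, 0.6], [1.0, 0.0]
print(math.dist(a, u1) ** 2, 2 * (1 - a[0]))   # identity (3.17)
```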








4. Influence Curve and Asymptotic Bias of ORM-Estimators

4.1 INFLUENCE CURVE 

Hampel's (1974) influence curve is one of the most useful heuristic tools in
robustness (cf. Huber, 1981). The study of its shape enables one to visualize the
way a small fraction of contaminated data can modify the asymptotic behavior
of a certain estimator. In addition it yields the asymptotic variance of the
estimator. In this section we compute the influence curve of the ORM-estimator and
study its behavior in more detail for some particular models.

In order to define the influence curve of the ORM-estimator we assume a
nominally Gaussian distribution for the EV model (2.7)-(2.7'). That is, we assume
that

    $x = X + \varepsilon$    (4.1)

where $X$ and $\varepsilon$ are independent, $X \sim N_{p+1}(\mu, \Sigma)$, $\varepsilon \sim N_{p+1}(0, \sigma^2 I)$, and the random
vector $X = (X_0, X_1, \ldots, X_p)'$ a.s. satisfies the linear model

    $X_0 = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p$ .    (4.2)

Thus $F = N(\mu, \Sigma + \sigma^2 I)$ is the distribution function of $x$. Now let $\delta_\xi$ be a point
mass located at $\xi \in \mathbb{R}^{p+1}$, and for $0 < \epsilon < 1$, let

    $G_{\epsilon,\xi} = (1 - \epsilon)F + \epsilon\,\delta_\xi$ .    (4.3)

The influence curve of $\beta = (\beta_0, \ldots, \beta_p)'$ is defined as

    $IC(\xi) = IC(\xi; \beta, F) = \lim_{\epsilon \to 0} \frac{\beta(G_{\epsilon,\xi}) - \beta(F)}{\epsilon}$    (4.4)

provided the limit exists. Here the estimator $\beta$ is regarded as an $\mathbb{R}^{p+1}$-valued
functional defined on some subset of distribution functions.






We compute the influence curve of the ORM-estimator in the usual way.
Namely, we differentiate the population (or asymptotic) estimating equations of
the M-estimator under $G_{\epsilon,\xi}$ with respect to $\epsilon$, and evaluate the derivative at $\epsilon = 0$.
This produces a system of linear equations which determine $IC(\xi)$.

The (population) estimating equations of the ORM-estimator under $G_{\epsilon,\xi}$ are

    $A_k(\epsilon) = (1 - \epsilon)E[\psi(e) z_k] + \epsilon\,\psi[e(\xi,\beta)]\,z_k(\xi,\beta) = 0$    (4.5)

for $k = 1, \ldots, p$ and

    $A_0(\epsilon) = (1 - \epsilon)E[\psi(e)] + \epsilon\,\psi[e(\xi,\beta)] = 0$ .    (4.6)

Here $z_k = z_k(\xi,\beta)$, $k = 1, \ldots, p$, and $e = e(\xi,\beta)$ are defined in (2.12) and
(2.13). Also $\beta = \beta[(1 - \epsilon)F + \epsilon\,\delta_\xi]$ is the asymptotic value of this parameter under
$G_{\epsilon,\xi}$. In addition $E[\psi(e) z_k]$ and $E[\psi(e)]$ stand for the more explicit notation
$E_F\{\psi[e(x,\beta)]\,z_k(x,\beta)\}$ and $E_F\{\psi[e(x,\beta)]\}$, respectively.

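Differentiation of (4.5) and (4.6) at $\epsilon = 0$ can be carried out with the help of
(2.14) and the following easily-derived relations:

    $\frac{\partial z_j}{\partial \beta_k} = -\delta\beta_j$ ,    $k = 0$
    $\frac{\partial z_j}{\partial \beta_k} = e\,(1 - \delta^2\beta_j^2)$ ,    $k = j$
    $\frac{\partial z_j}{\partial \beta_k} = \delta^2(\beta_k z_j - \beta_j z_k - \beta_k\beta_j e)$ ,    $k \ne j$, $k \ne 0$    (4.7)

for $j = 1, \ldots, p$. The resulting linear equations are

    $M\,IC(\xi;\beta) = \gamma(\xi,\beta)$    (4.8)

where

    $IC(\xi,\beta) = (IC_0(\xi,\beta), \ldots, IC_p(\xi,\beta))'$ ,

    $\gamma(\xi,\beta) = \psi[e(\xi,\beta)]\,(1, z_1(\xi,\beta), \ldots, z_p(\xi,\beta))'$    (4.9)

and $M$ is a $(p+1) \times (p+1)$ matrix with elements

    $m_{0j} = \delta E[\psi'(e)]$ ,    $j = 0$
    $m_{0j} = \delta^2 E[\psi'(e) z_j]$ ,    $j = 1, \ldots, p$    (4.10)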











and

    $m_{kj} = \delta E[\psi'(e) z_k]$ ,    $j = 0$
    $m_{kj} = \delta^2 E[\psi'(e) z_k^2] - E[\psi(e) e]\,(1 - \delta^2\beta_k^2)$ ,    $j = k$
    $m_{kj} = \delta^2 E[\psi'(e) z_k z_j] + \delta^2 E[\psi(e) e]\,\beta_k\beta_j$ ,    $j \ne k$, $j \ne 0$    (4.11)

for $k = 1, \ldots, p$.

The entries of $M$ are in general rather complicated, which makes it difficult
to give an explicit expression for $M^{-1}$ (and hence $IC(\xi,\beta)$) in the general case.
However, this is easy to do for the simple case where $p = 1$ and $\mu = 0$. In this
case $e$ and $z_1$ are independent normal variables with zero means,
$m_{01} = m_{10} = 0$, and one can check that $(1 - \delta^2\beta_1^2) = \delta^2 = (1 + \beta_1^2)^{-1}$. Therefore,

    $IC(\xi,\beta_0) = \frac{\psi[e(\xi,\beta)]}{\delta E[\psi'(e)]}$    (4.12)

and

    $IC(\xi,\beta_1) = \frac{\psi[e(\xi,\beta)]\,z_1(\xi,\beta)}{\delta^2\{E[\psi'(e)]E[z_1^2] - E[\psi(e) e]\}}$ .    (4.13)

Here the intercept component $IC(\xi,\beta_0)$ is independent of $z_1$ and is bounded,
while the slope component $IC(\xi,\beta_1)$ is unbounded.


REMARK: OR corresponds to the particular case $\psi(t) = t$ and $\psi'(t) = 1$ (cf. Kelly,
1984, eqs. (2.6) and (2.7)).

It is clear from (4.8) and (4.9) that the elements of $IC(\xi,\beta)$ are in general
unbounded. Nevertheless if $\psi$ is fully redescending (e.g., if $\psi$ is Tukey's bisquare
function), then $IC(\xi,\beta)$ is unbounded in a nice way: it can be arbitrarily large
only when $e(\xi,\beta)$, the departure of $\xi$ from the true error-free model, is small.
(This is also the behavior of an ordinary M-estimate using a redescending $\psi$ for the
classical linear model setup.)













To illustrate this consider the simple case $p = 1$ and $\beta_0 = \beta_1 = 0$, with $\psi$
fully redescending. For each $\zeta_1 \ne 0$ and $\alpha > 0$ let

4.(0 = KCl.fc): <\ = <> \<z/<\ SaJ 

For a = 0 let 

4.(0) = {(0, f 2 ): -«<f<oo{ 

4a = U 4a(C) 

and 


Notice that points in $A_\alpha^c = \mathbb{R}^2 - A_\alpha$ are "nice" (for small $\alpha$) in the sense that they
belong to lines through the origin with slope less than $\alpha$ (see Figure 4.1 below).

FIGURE 4.1

In Figure 4.2 below are presented plots of $g_\alpha(\zeta_1)$ versus $\zeta_1$ for various values of $\alpha$.
It shows that the maximum influence of points outside $A_\alpha$ is finite for all $\alpha > 0$ and
that this uniformly decreases as $\alpha$ gets larger.






















where cr 00 and m 00 are of order lxl, and £ n and M n are of order p xp. Then we 
have moo = )j, M 10 = M' 10 = 0. and 


    $M_{11} = E[\psi'(e)]\,\Sigma_{11} + (\sigma^2 - E[\psi(e) e])\,I_p$ .    (4.16)

If $E[\psi'(e)] \ge 0$ and $(\sigma^2 - E[\psi(e) e]) > 0$ (as is generally the case), then $M_{11}$ is
positive definite and invertible.



















5. Asymptotic Theory 


In this section the structural and functional error-in-variables models
introduced in section 2 ((2.9) and (2.9')) are considered in order to establish some
asymptotic properties of the ORM-estimator.

It is well known (Kendall and Stuart, 1979) that when the covariance matrix
of $\varepsilon$ is known (up to a constant factor), the classical orthogonal regression
estimator is consistent under both the structural and the functional EV models.

Subsection 5.1 is devoted to the study of the asymptotic properties of
ORM-estimators under the structural model. Consistency (Theorem 5.1), asymptotic
normality (Theorem 5.2), and qualitative robustness (Theorem 5.3) are
established. The asymptotic bias of a particular ORM-estimator under some
contaminated Gaussian EV models is numerically computed and presented in Table 5.1.
Subsection 5.2 is concerned with the asymptotic properties of ORM-estimators
under the functional EV model. Consistency (Theorem 5.4) and asymptotic
normality (subsection 5.2.2) are treated here.

5.1 STRUCTURAL CASE

5.1.1 Consistency

First we show (Theorem 5.1) that under fairly weak assumptions, an
ORM-estimate converges to its asymptotic (population) value. Then, Corollary 5.1
establishes the consistency of an ORM-estimator (convergence to the true
parameter in the EV-model) under additional distributional assumptions,
including that $A\varepsilon$ is spherically symmetric for some known, non-singular matrix $A$.







Theorem 5.1: Assume that the loss function $\rho$ is continuous, non-negative, and
non-decreasing on $[0, \infty)$. Suppose that there exists a unit vector $a_1$ and a real
number $b_1$ such that

(i) $E_F\{\rho(a_1'x - b_1)\} < \lim_{t \to \infty} \rho(t)$.

(ii) $(a_1, b_1)$ (strictly up to the sign) minimizes $E_F[\rho(a'x - b)]$ among all
unit vectors $a$ and real numbers $b$.

If $\hat{\theta}_n = (\hat{a}_n, \hat{b}_n)$ satisfies

    $\frac{1}{n}\sum_{i=1}^{n} \rho(\hat{a}_n'x_i - \hat{b}_n) - \inf_{\|a\|=1,\; b \in \mathbb{R}} \frac{1}{n}\sum_{i=1}^{n} \rho(a'x_i - b) \to 0$ a.s.    (5.1)

then

    $\hat{\theta}_n \to (a_1, b_1)$ a.s. $[F]$


Proof: The theorem follows from Theorem 1 of Huber (1967) after we show that
$\hat{\theta}_n$ almost surely ultimately stays in a compact set.

To show that $\hat{\theta}_n$ almost surely ultimately stays in a compact set we argue
as follows. By hypothesis, there exist $T > 0$, $M > 0$, $\delta > 0$ and $\epsilon > 0$ such that

(a) $\rho(t) \ge M$  $\forall\, |t| \ge T$

and

(b) $M(1 - \delta) > E_F\{\rho(a_1'x - b_1)\} + \epsilon$.

Let $K_1 > 0$ be such that

    $P_F(\|x\| \le K_1) \ge 1 - \delta/2$

and let $K > 0$ be such that $|b| > K$ and $\|x\| \le K_1$ implies that

    $|a'x - b| \ge T$  $\forall\, \|a\| = 1$.

Now, by the strong law of large numbers we have that for all $\|a\| = 1$ and $|b| > K$

    $\frac{1}{n}\sum_{i=1}^{n} \rho(a'x_i - b) \ge \frac{1}{n}\sum_{i=1}^{n} \rho(a'x_i - b)\,I(\|x_i\| \le K_1) \ge M\,\frac{1}{n}\sum_{i=1}^{n} I(\|x_i\| \le K_1)$
    $> E_F\{\rho(a_1'x - b_1)\} + \epsilon \ge \frac{1}{n}\sum_{i=1}^{n} \rho(a_1'x_i - b_1) + \epsilon/2$

for sufficiently large $n$, a.s. $[F]$. Therefore, for sufficiently large $n$, $|\hat{b}_n| \le K$
and $\hat{\theta}_n$ belongs to the compact set

    $[-K, K] \times \{a : \|a\| = 1\}$ .


Corollary 5.1: Assume that

(i) The loss function $\rho$ is continuous, non-negative, and non-decreasing
on $[0, \infty)$.

(ii) (a) The distribution of $\varepsilon$ is spherically symmetric.

     (b) The density $f$ of $Y = a'\varepsilon$ is unimodal and continuous.

(iii) $E[\rho(a'\varepsilon)] < \infty$.

(iv) There exists $\delta > 0$ such that $\rho$ and $f$ are strictly monotone on $[0, \delta)$.

If $\hat{\theta}_n$ satisfies (5.1) then

    $\hat{\theta}_n \to \theta_0 = (a_0, b_0)$ a.s. $[F]$


The proof of Corollary 5.1 uses the following lemma. 











Lemma 5.1: Let, for each $t \in \mathbb{R}$

    $g(t) = E[\rho(Y - t) - \rho(Y)]$    (5.3)

where $Y$ is as defined in (ii.b) above. Then

    $g(t) > 0$  $\forall\, t \ne 0$ .    (5.3')

Proof: See the Appendix.

Proof of Corollary 5.1: The corollary follows from Theorem 5.1 once we show that
our estimator is Fisher consistent over the distributions specified by (ii) and
(iv), namely that

    $E_F[\rho(a'x - b) - \rho(a_0'x - b_0)] \ge 0$  $\forall\, \|a\| = 1,\ b \in \mathbb{R}$    (5.4)

with equality iff $(a,b) = \theta_0$ or $(a,b) = -\theta_0$.

Since (5.4) trivially holds when $E[\rho(a'x - b)] = \infty$ we only need to consider
the case when $\rho(a'x - b)$ is integrable. In this case by (ii) and (iii) and Lemma 5.1

    $E[\rho(a'x - b) - \rho(a_0'x - b_0)] = E[\rho(a'x - b) - \rho(a'\varepsilon)]$
    $= \int\!\!\int [\rho(y - (b - a'X)) - \rho(y)]\,dF_Y(y)\,dF_X(X)$
    $= \int g(b - a'X)\,dF_X(X) \ge 0$ .    (5.5)

Finally, if the last term of (5.5) is equal to 0 then $g(b - a'X) = 0$ a.s., which
implies that $a'X - b = 0$ a.s. and $\theta = \theta_0$ or $\theta = -\theta_0$.

The assumption that $A\varepsilon$ is spherically symmetric is weaker than the
assumption that $\varepsilon_0$ and $\varepsilon_1$ are independent, with $\lambda = \mathrm{Var}(\varepsilon_0)/\mathrm{Var}(\varepsilon_1)$ known, in
the sense that the former does not require the existence of second moments.

On the other hand, the assumption that $A\varepsilon$ is spherically symmetric is a
stronger assumption in the sense that it ensures the consistency result of
Corollary 5.1. We have not been able to obtain this result with the "$\lambda$ known"
assumption.

Suppose that $\varepsilon_0$ and $\varepsilon_1$ are independent with distribution $F$ and $\lambda = 1$. Then
the distribution of $\varepsilon$ is not spherically symmetric unless $\varepsilon_0$ and $\varepsilon_1$ are normal, as
can be easily shown following the style of the proof of Theorem 4.6.4 in Chung
(1974). If in this situation $F$ is non-normal, then it appears that an
ORM-estimator will not in general be consistent, while the classical OR-estimator will
be consistent. However, numerical computations of the asymptotic bias under
various non-spherically symmetric contaminated normal models (see Table 5.1
below) reveal, at least for the particular ORM-estimator under consideration,
that (i) for $\lambda \ne 1$ the asymptotic bias is much smaller than that of the
OR-estimate, and (ii) when $\lambda = 1$ ($\tau_1 = \tau_2$) the computed values of the asymptotic
bias of the ORM are so close to zero that one suspects they are exactly equal to zero.
However, we have not yet been able to prove this for any non-spherically symmetric
distribution. This point may deserve further attention.

In summary, the fact that an ORM-estimator may be asymptotically slightly
biased in situations under which the OR-estimator is consistent is not so serious
an objection as it might at first appear to be, since from the robustness point of
view this small bias may be regarded as an "insurance fee" we are willing to pay
to achieve a high degree of bias robustness and thereby avoid the catastrophic
behavior of the OR-estimator in the presence of outliers.






Asymptotic Bias under Contaminated Gaussian EV Models

As we have seen above, ORM-estimators are asymptotically unbiased under
a spherically symmetric EV model. We may well ask what happens when the
distribution of the error term of the EV model is a non-spherically-symmetric
outlier generating distribution. As an example we consider the very simple
EV-contaminated model: $p = 1$, $\beta_1 = 1$, $\beta_0 = 0$, $X_0 \sim N(0,4)$, and $\tilde{\varepsilon} = \varepsilon + (B_1 A_1, B_2 A_2)'$,
where $X_0$, $B_1$, $B_2$, $A_1$, and $A_2$ are independent, $B_j \sim \mathrm{Binomial}(1, \gamma)$ and
$A_j \sim N(0, \tau_j)$, $(j = 1, 2)$. In this case the classical orthogonal regression
estimator, as well as the ORM-estimator, is asymptotically biased. Meanwhile, as
illustrated in Table 5.1 below, the asymptotic bias of the ORM-estimator can be
radically smaller than the asymptotic bias of the classical OR estimator. This speaks
well for the bias robustness properties of the ORM-estimator. The particular
ORM-estimator considered here is the one with (Tukey's) loss function



for | x i £ c 
for |*| gc 


(5.5’) 


with c =4.7. 
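For the classical OR estimator the asymptotic slope under the contaminated model above can be written in closed form, because for $p = 1$ the OR line is the major axis of the covariance matrix of $x$. The sketch below is an illustration under explicit assumptions: $\tau_j$ is read as a variance, $\tau_1$ and $\tau_2$ are taken to contaminate $x_0$ and $x_1$ respectively, the uncontaminated error variance is `sigma2`, and the function name and numerical inputs are made up rather than taken from Table 5.1.

```python
import math

def or_population_slope(var_x, sigma2, gamma, tau1, tau2):
    """Asymptotic OR slope when X_0 = X_1 (beta_0 = 0, beta_1 = 1).

    Assumptions: Var(X_0) = Var(X_1) = var_x, base error variance sigma2 in each
    coordinate, and contamination B_j A_j with variance gamma * tau_j.
    """
    s00 = var_x + sigma2 + gamma * tau1   # variance of x_0
    s11 = var_x + sigma2 + gamma * tau2   # variance of x_1
    s01 = var_x                           # covariance of x_0 and x_1
    d = s00 - s11
    # Slope of the major axis of [[s00, s01], [s01, s11]] (requires s01 > 0).
    return (d + math.sqrt(d * d + 4.0 * s01 * s01)) / (2.0 * s01)

print(or_population_slope(4.0, 1.0, 0.1, 50.0, 50.0))    # equal contamination
print(or_population_slope(4.0, 1.0, 0.1, 100.0, 40.0))   # unequal contamination
```

With equal contamination variances the slope stays at the true value 1, while unequal variances move it away from 1, which is the OR bias discussed in the text. No corresponding closed form for the ORM-estimator is attempted here.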


Of course further study is called for, and we intend to carry out analogous
computations for higher order models and a richer class of contaminating
distributions, and also some Monte Carlo simulations to obtain finite sample bias.







5.1.2 Asymptotic Normality 


This subsection is devoted to the proof of the asymptotic normality of ORM 
estimators. The assumptions under which asymptotic normality is proven are: 

(1) $(\hat{a}_n, \hat{b}_n) = \hat{\theta}_n \to \theta_0 = (a_0, b_0)$ a.s. as $n \to \infty$.

(2) iff is symmetric, bounded and ) \/ t € R 
(3) $E\{\psi(a_0'x - b_0)\,x\} = E\{\psi(a_0'x - b_0)\,a_0'x\} \cdot a_0$.

(4) $E\{\psi(a_0'\varepsilon)\,a_0'\varepsilon\} > 0$.

(5) $E\{\|x\|^3\} < \infty$.

NOTE: Assumption (3) holds whenever $(a_0, b_0)$ minimizes $E\{\rho(a'x - b)\}$ among
unit vectors $a$ in $\mathbb{R}^{p+1}$ and $b$ in $\mathbb{R}$, for the Lagrangian of this optimization
problem is

    $L(a, b, \lambda) = E[\rho(a'x - b)] - \frac{1}{2}\lambda(a'a - 1)$    (5.6)

and

    $\frac{\partial}{\partial a} L = E[\psi(a'x - b)\,x] - \lambda a = 0$    (5.7)

    $\frac{\partial}{\partial b} L = -E[\psi(a'x - b)] = 0$ .    (5.8)

Premultiplying (5.7) by $a'$ and using $a'a = 1$ we get

    $\lambda = E[\psi(a'x - b)\,a'x]$ .    (5.9)


To begin we show that ORM-estimators satisfy an equation which has a
non-singular derivative.

First we consider models without an intercept term. Recall that in this case
$\hat{a}_n$ minimizes







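to get the defining equation

    $(I - \hat{a}_n\hat{a}_n')\,\frac{1}{n}\sum_{i=1}^{n} \psi(\hat{a}_n'x_i)\,x_i = 0$ .    (5.12)

REMARK: In fact

    $\frac{\partial}{\partial a}\left(\frac{a}{\|a\|}\right) = \frac{1}{\|a\|}\left(I - \frac{aa'}{\|a\|^2}\right)$    (5.13)

but we may choose our estimate to have length one, so that (5.12) holds true.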


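Let

    $\psi(x; a) = (I - aa')\,\psi(a'x)\,x$ ,  $\forall\, a \in \mathbb{R}^{p+1}$    (5.14)

and

    $\lambda(a) = E\{\psi(x; a)\}$  $\forall\, a \in \mathbb{R}^{p+1}$ .    (5.15)

Lemma 5.3: Let

    $D\lambda(a) = \left[\frac{\partial}{\partial a_j}\lambda_i(a)\right]_{i,j = 1, \ldots, p+1}$ .    (5.16)

Then $D\lambda(a_0)$ is a non-singular matrix.

Proof: It can be easily verified that

    $D\lambda(a) = E\{(I - aa')\,xx'\,\psi'(a'x)\} - E\{\psi(a'x)[(a'x)I + ax']\}$    (5.17)

and using assumption (3) we get

    $D\lambda(a_0) = (I - a_0a_0')\Sigma - \alpha_0(I + a_0a_0')$    (5.18)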


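where

    $\Sigma = E\{\psi'(a_0'x)\,xx'\} = E[\psi'(a_0'\varepsilon)\,\varepsilon\varepsilon'] + E\{\psi'(a_0'\varepsilon)\}E\{XX'\}$    (5.19)

and

    $\alpha_0 = E\{\psi(a_0'x)\,a_0'x\} = E\{\psi(a_0'\varepsilon)\,a_0'\varepsilon\}$ .    (5.20)

Let $a \in \mathbb{R}^{p+1}$. We will show that $[D\lambda(a_0)]a = 0 \Rightarrow a = 0$.

Case 1: $a = \alpha a_0$ for some $\alpha \in \mathbb{R}$

    $0 = a'[D\lambda(a_0)] = \alpha a_0'(I - a_0a_0')\Sigma - \alpha\alpha_0 a_0'(I + a_0a_0')$
    $= -2\alpha\alpha_0 a_0' = 0 \Rightarrow \alpha = 0$    (5.21)

since $\alpha_0 = E\{\psi(a_0'\varepsilon)\,a_0'\varepsilon\} \ne 0$ by assumption (4).

Case 2: $a \ne \alpha a_0$ for any $\alpha \in \mathbb{R}$

    $0 = a'[D\lambda(a_0)] = a'(I - a_0a_0')\Sigma - \alpha_0 a'(I + a_0a_0')$ .    (5.22)

This implies that

    $0 = a'[D\lambda(a_0)](I - a_0a_0')a = a'(I - a_0a_0')(\Sigma - \alpha_0 I)(I - a_0a_0')a$    (5.23)

because $(I + a_0a_0')(I - a_0a_0') = (I - a_0a_0')$ and $(I - a_0a_0')^2 = (I - a_0a_0')$. We notice that
if $c = (I - a_0a_0')a$ then $c'a_0 = a'(I - a_0a_0')a_0 = 0$.

The lemma follows after we prove the following claim.

Claim: If $c'a_0 = 0$ then $c'\Sigma c \ne \alpha_0 c'c$ unless $c = 0$.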
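Proof: $E\{\rho(a'\varepsilon)\}$ is a continuous (in fact constant) function on the compact set
$C_k = \{a : a'a = k\}$. Because of Lagrange's multipliers theorem there exist $a_0$ and
$\beta$ such that

    $a_0'a_0 = k$ ,

    $E\{\psi(a_0'\varepsilon)\,\varepsilon\} = \beta a_0$ .

Hence

    $E\{\psi(a'\varepsilon)\,\varepsilon\} = \beta a$  $\forall\, a \in C_k$ .

In fact, if $P$ is an orthogonal matrix then, writing

    $\tilde{\varepsilon} = P\varepsilon$ ,  $\varepsilon = P'\tilde{\varepsilon}$ ,  $\tilde{\varepsilon} \cong \varepsilon$ ,

    $a = Pa_0$ ,  $a_0 = P'a$ ,

we have

    $\beta a_0 = E\{\psi(a_0'\varepsilon)\,\varepsilon\} = E\{\psi(a_0'P'\tilde{\varepsilon})\,P'\tilde{\varepsilon}\} = P'E\{\psi(a'\tilde{\varepsilon})\,\tilde{\varepsilon}\}$ .

It follows that

    $E\{\psi(a'\varepsilon)\,\varepsilon\} = \beta P a_0 = \beta a$

and, letting $k$ vary,

    $E\{\psi(a'\varepsilon)\,\varepsilon\} = \beta(\|a\|)\,a$  $\forall\, a \in \mathbb{R}^{p+1}$ .

Hence

    $\frac{\partial}{\partial a} E\{\psi(a'\varepsilon)\,\varepsilon\} = \frac{\partial}{\partial a}[\beta(a)\,a] = \beta(a)I + a\,\frac{\partial}{\partial a'}\beta(a)$ .

On the other hand

    $\frac{\partial}{\partial a} E\{\psi(a'\varepsilon)\,\varepsilon\} = E\{\psi'(a'\varepsilon)\,\varepsilon\varepsilon'\}$

so that

    $\Sigma = E\{\psi'(a_0'\varepsilon)\,\varepsilon\varepsilon'\} + E\{\psi'(a_0'\varepsilon)\}E\{XX'\}$
    $= \alpha_0 I + a_0\,\frac{\partial}{\partial a'}\beta(a_0) + E\{\psi'(a_0'\varepsilon)\}E\{XX'\}$

by (5.30), (5.31) and recalling that $\alpha_0 = \beta(a_0)$.

Now suppose that

    $c'\Sigma c = \alpha_0 c'c$ .    (5.33)

Since $c'a_0 = 0$ the expression for $\Sigma$ above gives

    $c'\Sigma c = \alpha_0 c'c + E\{\psi'(a_0'\varepsilon)\}E\{(c'X)^2\}$    (5.34)

and this is impossible unless $c'X = 0$ a.s.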


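We now consider the general model. The estimators $(\hat{a}_n, \hat{b}_n)$ satisfy the
equations

    $(I - \hat{a}_n\hat{a}_n')\,\frac{1}{n}\sum_{i=1}^{n} \psi(\hat{a}_n'x_i - \hat{b}_n)\,x_i = 0$

    $-\frac{1}{n}\sum_{i=1}^{n} \psi(\hat{a}_n'x_i - \hat{b}_n) = 0$    (5.35)

or equivalently

    $(I - \hat{a}_n\hat{a}_n')\,\frac{1}{n}\sum_{i=1}^{n} \psi(\hat{a}_n'x_i - \hat{b}_n)(x_i - u) = 0$

    $-\frac{1}{n}\sum_{i=1}^{n} \psi(\hat{a}_n'x_i - \hat{b}_n) = 0$    (5.36)

for any constant vector $u$ and in particular for $u = E\{x\}$. Let

    $\psi(x; a, b) = \begin{pmatrix} (I - aa')\,\psi(a'x - b)(x - u) \\ -\psi(a'x - b) \end{pmatrix}$    (5.37)

and

    $\lambda(a, b) = E\{\psi(x; a, b)\}$ .    (5.38)

Then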


D \(ao,t>o) = 


(I-aoa'o)E- a 0 (I+aoa'o) (I-aoa'o)^ jV , (a , oe)(e+x-u)!| 


(I-aoa'o)W(a’ 0 e)(e+x)} oOJ 


(5.39) 


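where

    $\Sigma = E\{\psi'(a_0'\varepsilon)(x - u)(x - u)'\}$    (5.40)

and

    $\alpha_0 = E\{\psi(a_0'\varepsilon)\,a_0'\varepsilon\}$ .    (5.41)

We notice that by assumption (2)

    $(I - a_0a_0')E\{\psi'(a_0'\varepsilon)(\varepsilon + X - u)\} = (I - a_0a_0')[E\{\psi'(a_0'\varepsilon)\,\varepsilon\} + E\{\psi'(a_0'\varepsilon)\}E\{X - u\}]$
    $= 0$ .    (5.42)

Finally

    $\det[D\lambda(a_0, b_0)] = E\{\psi'(a_0'\varepsilon)\}\,\det[(I - a_0a_0')\Sigma - \alpha_0(I + a_0a_0')] \ne 0$ .    (5.43)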

Now we want to check the remaining conditions for asymptotic normality of
Huber (1967). Let

    $\theta = (a, b)$ and $\theta_0 = (a_0, b_0)$ .    (5.44)

Lemma 5.4: There exist $\alpha > 0$ and $\delta_0 > 0$ such that

    $\|\lambda(\theta)\| \ge \alpha\|\theta - \theta_0\|$  $\forall\, \|\theta - \theta_0\| < \delta_0$ .    (5.45)

Proof: See the Appendix.

Let

    $\psi(x; \theta) = (I - aa')\,\psi(a'x - b)\,x$    (5.46)

and

    $u(x, \theta, d) = \sup_{\|\tau - \theta\| \le d} \|\psi(x; \tau) - \psi(x; \theta)\|$ .    (5.47)

Lemma 5.5: There exist $\beta_1 > 0$, $\beta_2 > 0$, and $\delta_0 > 0$ such that

    $E\{u(x, \theta, d)\} \le \beta_1 d$  for $\|\theta - \theta_0\| + d \le \delta_0$    (5.48)

and

    $E\{u^2(x, \theta, d)\} \le \beta_2 d$  for $\|\theta - \theta_0\| + d \le \delta_0$ .    (5.49)

Proof: See the Appendix.

Lemmas 5.3, 5.4 and 5.5 show that all the conditions for the Corollary to
Theorem 3 of Huber (1967) hold. In our case we have

    $A = D\lambda(a_0, b_0)$    (5.50)

and

    $C = \mathrm{cov}[\psi(x; a_0, b_0)]$    (5.51)

where $\psi(x; a_0, b_0)$ is defined in (5.37). Thus, we have established the following
Theorem.

Theorem 5.2: If assumptions (1)-(5) hold then

    $\sqrt{n}\,(\hat{\theta}_n - \theta_0) \to N(0, V)$    (5.52)

where $V = A^{-1} C (A')^{-1}$.
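The sandwich form of $V$ in Theorem 5.2 is straightforward to evaluate once estimates of $A$ and $C$ are available. The sketch below works with 2 x 2 matrices only and uses made-up inputs; it is not an implementation of the estimators of $A$ and $C$, which the text does not specify, and the function names are illustrative.

```python
def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular matrix")
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(p, q):
    """Product of two matrices given as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def sandwich(A, C):
    """V = A^{-1} C (A')^{-1}, using (A')^{-1} = (A^{-1})'."""
    A_inv = inverse_2x2(A)
    return matmul(matmul(A_inv, C), transpose(A_inv))

A = [[1.0, 1.0], [0.0, 1.0]]   # hypothetical derivative matrix
C = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical covariance of the score
print(sandwich(A, C))
```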
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5.1.3 Qualitative Robustness of the GRM-Estimator 

Theorem. 5.3: If p is bounded and non-decreasing on [0,») then the ORM- 
estimator is qualitatively robust at any distribution F for which £yfp(a'x-6)j 
has a unique minimum. 

Proof: To prove qualitative robustness of the ORM-estimator it is enough to show
that this estimator is weakly continuous when viewed as a functional on the set
of distribution functions (cf. Huber, 1981). Let $F^{(m)}$ be a sequence of distribution
functions on $\mathbb{R}^{p+1}$ which converges weakly toward $F$. We will show that
$\theta_m = \theta(F^{(m)})$ can be uniquely defined for $m \ge N$ and that $\theta_m \to \theta_0 = \theta(F)$ as
$m \to \infty$. First we notice that the family of functions

    $\{\rho(x'a - b) : \|a\| = 1, b \in \mathbb{R}\}$    (5.53)

is uniformly bounded and equicontinuous at all $x \in \mathbb{R}^{p+1}$ (i.e., for all $x \in \mathbb{R}^{p+1}$ and all
$\varepsilon > 0$, there exists $\delta > 0$ with the property that $\|x - y\| < \delta$ implies
$|\rho(a'x - b) - \rho(a'y - b)| < \varepsilon$, for all $\|a\| = 1$ and all $b \in \mathbb{R}$).

Therefore

    $\tau_m(a, b) = \int_{\mathbb{R}^{p+1}} \rho(a'x - b)\,dF^{(m)}(x) \to \int_{\mathbb{R}^{p+1}} \rho(a'x - b)\,dF(x) \equiv \tau(a, b)$    (5.54)

uniformly on $\|a\| = 1$, $b \in \mathbb{R}$ (see Billingsley, 1968, p. 17).

Let $\theta_0 = (a_0, b_0)$ be the point at which $\tau(a, b)$ achieves its unique minimum
$\tau_0 = \tau(a_0, b_0)$ and $M = \lim_{t \to \infty} \rho(t)$. Hence

    $\delta = M - \tau_0 > 0$ .    (5.55)

Let $N \in \mathbb{N}$ and $K > 0$ be such that for all $m \ge N$, $|b| \ge K$, and $\|a\| = 1$ we have

    $\tau_m(a, b) \ge \tau(a, b) - \frac{\delta}{4}$    (5.56)

and

    $\tau(a, b) \ge M - \frac{\delta}{4}$ .    (5.57)

Thus, for all $m \ge N$, $|b| > K$, and $\|a\| = 1$

    $\tau_m(a, b) \ge \tau(a, b) - \frac{\delta}{4} \ge M - \frac{\delta}{2} = \tau_0 + \frac{\delta}{2}$ .    (5.58)

On the other hand, for $m \ge N$

    $\tau_m(a_0, b_0) \le \tau(a_0, b_0) + \frac{\delta}{4} = \tau_0 + \frac{\delta}{4}$ .    (5.59)

From (5.58) and (5.59) it follows that for all $m \ge N$, $\tau_m(a, b)$ achieves its
minimum on the compact set $\{(a, b) : \|a\| = 1, |b| \le K\}$. We can uniquely define
$\theta_m = \theta(F^{(m)})$ as in section 3.

In order to prove convergence of $\theta_m$ to $\theta_0$, let $\varepsilon > 0$ be fixed. A compactness
argument similar to the one used by Huber (1967) in the proof of Theorem 1
shows that there exists $\delta_0 > 0$ such that

    $\|\theta - \theta_0\| = \|a - a_0\| + |b - b_0| > \varepsilon \implies \tau(\theta) > \tau(\theta_0) + \delta_0$ .    (5.60)

Therefore, there exists $N \in \mathbb{N}$ such that for all $m \ge N$

    $\|\theta - \theta_0\| > \varepsilon \implies \tau_m(\theta) > \tau(\theta_0) + \delta_0 - \frac{\delta_0}{2} = \tau_0 + \frac{\delta_0}{2}$    (5.61)

and the theorem follows because for all $m \ge N$

    $\tau_m(\theta_0) \le \tau(\theta_0) + \frac{\delta_0}{2}$ .    (5.62)
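
The role of a bounded loss in this proof can be illustrated on empirical distributions: if $\rho$ is bounded by $M$, replacing $k$ of $n$ observations by arbitrary values changes the empirical objective $\tau$ by at most $kM/n$, uniformly in $(a, b)$. The sketch below checks this bound numerically. It assumes a Tukey bisquare loss rescaled so that $M = 1$, and the sample, the outlier value, and the function names are hypothetical choices made for illustration only.

```python
import random


def rho_bisquare(t, c=4.685):
    """Bisquare loss rescaled to sup = 1: even, non-decreasing on [0, inf), bounded."""
    if abs(t) >= c:
        return 1.0
    u = (t / c) ** 2
    return 1.0 - (1.0 - u) ** 3


def tau(points, a, b, rho=rho_bisquare):
    """Empirical objective: the average of rho(a'x - b) over the sample."""
    total = 0.0
    for x in points:
        total += rho(sum(ai * xi for ai, xi in zip(a, x)) - b)
    return total / len(points)


random.seed(0)
clean = [(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)) for _ in range(200)]

# Replace the last k observations by one gross outlier (hypothetical value).
k = 10
contaminated = clean[:-k] + [(1.0e6, -1.0e6)] * k

a, b = (0.6, 0.8), 0.1  # a has unit norm
gap = abs(tau(clean, a, b) - tau(contaminated, a, b))
bound = k / len(clean)  # k * M / n with M = 1
```

With an unbounded loss such as $\rho(t) = t^2$ no such bound is available, which is the empirical counterpart of the lack of weak continuity of classical orthogonal regression.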


5.2 FUNCTIONAL CASE 


5.2.1 Consistency 

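Theorem 5.4: In addition to (5.2), (i), (ii) and (iv) of Theorem 5.1 assume that

(viii) $\lim_{t \to \infty} \rho(t) = M < \infty$ .    (5.63)

(ix) For all $\varepsilon > 0$, there exists $t > 0$ such that

    $\lim_{n \to \infty} \frac{n(t)}{n} \ge 1 - \varepsilon$    (5.64)

where

    $n(t) = \# I_n(t)$ ,  $I_n(t) = \{i : \|X_i\| \le t\}$ .    (5.65)

(x) For all $\theta \ne \theta_0$, there exist $\delta > 0$, $\varepsilon > 0$, and $N \in \mathbb{N}$ such that

    $\frac{n^*(\varepsilon)}{n} \ge \delta$  for all $n \ge N$    (5.66)

where

    $n^*(\varepsilon) = \# I_n^*(\varepsilon)$ ,  $I_n^*(\varepsilon) = \{i : |a'X_i - b| \ge \varepsilon\}$ .    (5.67)

Then $\theta_n \to_p \theta_0$. Furthermore, almost sure convergence of $\theta_n$ to $\theta_0$ holds provided that

    $\frac{1}{n}\sum_{i=1}^{n} E\{\rho(a'x_i - b)\} \to g(a, b)$  for all $\|a\| = 1$, $b \in \mathbb{R}$    (5.68)

for some continuous function $g$.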












NOTE: Assumption (ix) is equivalent to requiring that the sequence of empirical
distributions

    $F_n(t) = \frac{1}{n}\sum_{i=1}^{n} I_{[0,t]}(\|X_i\|)$    (5.69)

is tight. Assumption (x) essentially means that besides $H(a_0, b_0)$, there is no
other hyperplane containing most of the $X_i$. A sufficient condition for (ix) is that
the largest eigenvalue $\lambda_{p,n}$ of

    $\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})' = S_n$    (5.70)

is bounded, while a sufficient condition for (x) is that the smallest eigenvalue
$\lambda_{1,n}(K)$ of

    $\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})' I(\|X_i\| \le K) = S_n(K)$    (5.71)

is bounded away from 0 for some $K \ge 0$ and all $n \ge N > 0$.
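
The two counting quantities behind assumptions (ix) and (x) are easy to compute for a given design. The following sketch evaluates $n(t)/n$ and $n^*(\varepsilon)/n$ for a small, hypothetical set of true points lying on a line in the plane; the function names and the data are illustrative assumptions, not part of the paper.

```python
import math


def frac_within(X, t):
    """n(t)/n: fraction of design points with ||X_i|| <= t."""
    count = sum(1 for x in X if math.sqrt(sum(v * v for v in x)) <= t)
    return count / len(X)


def frac_off_hyperplane(X, a, b, eps):
    """n*(eps)/n: fraction of design points with |a'X_i - b| >= eps."""
    count = sum(1 for x in X
                if abs(sum(ai * xi for ai, xi in zip(a, x)) - b) >= eps)
    return count / len(X)


# Hypothetical design: five true points on the line X_2 = 2 * X_1.
design = [(float(s), 2.0 * s) for s in (-2, -1, 0, 1, 2)]

# Unit normal of the true hyperplane H(a0, b0), with b0 = 0.
norm = math.sqrt(5.0)
a0 = (2.0 / norm, -1.0 / norm)

share_within_3 = frac_within(design, 3.0)                          # n(3)/n
share_off_true = frac_off_hyperplane(design, a0, 0.0, 0.5)         # true hyperplane
share_off_other = frac_off_hyperplane(design, (1.0, 0.0), 0.0, 0.5)  # competitor
```

Here every point lies on the true hyperplane, while a competing hyperplane leaves a positive fraction of points at distance at least $\varepsilon$, which is the situation that assumption (x) requires for every $\theta \ne \theta_0$.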

The following lemmas are needed to prove Theorem 5.4. For proofs of these 
lemmas see the Appendix. 

Lemma 5.6:

    $\frac{1}{n}\sum_{i=1}^{n} \left[\rho(a'x_i - b) - E\{\rho(a'x_i - b)\}\right] \to 0$  a.s.    (5.72)

uniformly on $\|a\| = 1$ and $b$.

Now Lemmas 5.1 and 5.6 are used to prove the following lemma. 

Lemma 5.7: $\theta_n = (a_n, b_n)$ is a.s. ultimately in a compact set.





Lemma 5.8: The family of functions

    $\left\{h_n(a, b) : h_n(a, b) = \frac{1}{n}\sum_{i=1}^{n} E\{\rho(a'x_i - b)\}\right\}$    (5.73)

is equicontinuous and pointwise bounded on

    $C = \{\theta : \theta = (a, b), \|a\| = 1, b \in \mathbb{R}\}$ .    (5.74)


Lemma 5.9: For all $\theta \in C$, $\theta \ne \theta_0$ there exist $\Delta > 0$ and $N \in \mathbb{N}$ such that

    $h_n(\theta) \ge h_n(\theta_0) + \Delta$  for all $n \ge N$ .    (5.75)


Proof of Theorem 5.4: Lemma 5.6 is just a technicality needed, together with
Lemma 5.1, to prove Lemma 5.7. The first part of the theorem is proved by
showing that given a subsequence $\theta_{n'}$, there exists a further subsequence
$\theta_{n''}$ such that

    $\theta_{n''} \to \theta_0$  a.s.    (5.76)

By the Arzela-Ascoli theorem, which can be applied to the family $\{h_n\}$ in view of Lemma 5.8,
given a subsequence $h_{n'}(\theta)$ there exists a further subsequence $h_{n''}(\theta)$ and a continuous
function $h(\theta)$ such that

    $\lim_{n'' \to \infty} h_{n''}(\theta) = h(\theta)$    (5.77)

uniformly on $C$.

Now Lemmas 5.7 and 5.9 allow us to prove the theorem by following the 
same method used by Huber (1967) to prove Theorem 2. The almost sure part of 
the theorem follows in the same way except that now assumption (ix) makes the 
subsequence argument no longer needed. 
5.2.2 Asymptotic Normality

The proof of Theorem 5.2 was based on Huber (1967), which deals only with
the case of i.i.d. observations. Under the functional model the observations
$x_1, \ldots, x_n$ are not i.i.d. because $E\{x_i\} = X_i$ with the $X_i$ fixed and, in general,
$X_i \ne X_j$ for $i \ne j$. Hence adapting the proof of Theorem 5.2 to cover the functional
case could be done by extending Huber (1967) to the independent, but not
identically distributed, case. Such an extension is of interest in itself, but we
will not pursue the issue here.

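A Taylor expansion of the ORM estimating equation (2.18) gives (we only consider
the no-intercept case here)

    $0 = \sum_{i=1}^{n} \psi[e_i(\hat\beta)]z_i(\hat\beta) = \sum_{i=1}^{n} \psi[e_i(\beta_0)]z_i(\beta_0)$
    $\quad + \left\{\sum_{i=1}^{n} \psi'[e_i(\beta_0)]z_i(\beta_0)\nabla e_i(\beta_0)' + \sum_{i=1}^{n} \psi[e_i(\beta_0)]\nabla z_i(\beta_0)\right\}(\hat\beta - \beta_0) + R_n$ .    (5.78)

Hence, under enough regularity conditions on the sequence $X_1, X_2, \ldots$, the distribution
of $e$ and the score function $\psi$, $\sqrt{n}(\hat\beta - \beta_0)$ will be asymptotically
equivalent to

    $-A^{-1}(\beta_0)\,\frac{1}{\sqrt{n}}\sum_{i=1}^{n} \psi[e_i(\beta_0)]z_i(\beta_0)$

where

    $A = A(\beta_0) = \mathrm{P}\lim \frac{1}{n}\sum_{i=1}^{n} \left\{\psi'[e_i(\beta_0)]z_i(\beta_0)\nabla e_i(\beta_0)' + \psi[e_i(\beta_0)]\nabla z_i(\beta_0)\right\}$ .    (5.79)

Therefore

    $\sqrt{n}(\hat\beta - \beta_0) \to N[0, A^{-1}V(A^{-1})']$    (5.80)

where

    $V = \mathrm{Cov}(\psi[e(\beta_0)]z(\beta_0))$ .    (5.81)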


REMARK (qualitative robustness of ORM-estimators under the functional model): 
Theorem 5.2 of Huber (1981) only applies to estimators which depend on i.i.d.
observations. Therefore our proof of qualitative robustness for the ORM-estimators
under the structural model will not carry over in a straightforward
way to the functional model. The basic difficulty is that under the functional
model the distribution of the stochastic process $x_1, x_2, \ldots$ cannot be
characterized simply by a marginal distribution. The distributions $F_i$ of the $x_i$
have different locations $X_i$. Thus the overall measure $\mu$ for the process
$x_1, x_2, \ldots$ will depend upon the locations $X_1, X_2, \ldots$, as well as on the
common marginal distribution for the $e_i$. Nevertheless our conjecture is that
under appropriate regularity conditions, ORM-estimators with a bounded loss 
function are qualitatively robust. It should be possible to prove this via the 
approach used in the proof of Theorem 5.3, along with some reformulation of the 
asymptotic “estimating minimization problem” in the spirit of Theorem 5.4. 






Appendix 


MLE MOTIVATION (general p case):

To extend the MLE motivation of Section 2 to the general p case it is enough
to notice that the system of linear equations (on $X_1, \ldots, X_p$)

    $(x_0 - \beta_0 - \beta_1X_1 - \beta_2X_2 - \cdots - \beta_pX_p)\beta_1 + (x_1 - X_1) = 0$
    $(x_0 - \beta_0 - \beta_1X_1 - \beta_2X_2 - \cdots - \beta_pX_p)\beta_2 + (x_2 - X_2) = 0$
    $\vdots$
    $(x_0 - \beta_0 - \beta_1X_1 - \beta_2X_2 - \cdots - \beta_pX_p)\beta_p + (x_p - X_p) = 0$

has the (unique) solution

    $X_k = x_k + \delta\beta_k e = \delta z_k$ ,  $k = 1, \ldots, p$ .

Hence, for $i = 1, \ldots, n$ we have that

    $(x_{0i} - \beta_0 - \beta_1X_{1i} - \cdots - \beta_pX_{pi})^2 + \sum_{k=1}^{p} (x_{ki} - X_{ki})^2 = e_i^2$

and

    $(x_{0i} - \beta_0 - \beta_1X_{1i} - \cdots - \beta_pX_{pi})^2 = \delta^2 e_i^2$ .
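
The closed-form solution of this linear system can be checked numerically. The sketch below assumes the conventions $\delta = 1/\sqrt{1 + \beta'\beta}$ and $e = \delta(x_0 - \beta_0 - \beta'x)$; these definitions belong to Section 2 and are restated here as assumptions, and the function name and the numerical values are hypothetical.

```python
import math


def orthogonal_fit_point(x0, x, beta0, beta):
    """Return (X, e, delta) with X_k = x_k + delta * beta_k * e.

    Assumed conventions: delta = 1 / sqrt(1 + beta'beta) and
    e = delta * (x0 - beta0 - beta'x).
    """
    delta = 1.0 / math.sqrt(1.0 + sum(b * b for b in beta))
    e = delta * (x0 - beta0 - sum(b * xk for b, xk in zip(beta, x)))
    X = [xk + delta * bk * e for xk, bk in zip(x, beta)]
    return X, e, delta


# Hypothetical observation and coefficients with p = 3.
x0, x = 3.0, [1.0, -2.0, 0.5]
beta0, beta = 0.25, [0.5, -1.0, 2.0]

X, e, delta = orthogonal_fit_point(x0, x, beta0, beta)

# Residual of the regression equation evaluated at the solution X.
r = x0 - beta0 - sum(b * Xk for b, Xk in zip(beta, X))

# Left-hand sides of the p linear equations; each should vanish.
equations = [r * bk + (xk - Xk) for bk, xk, Xk in zip(beta, x, X)]

# The two identities stated after the solution.
total_squares = r ** 2 + sum((xk - Xk) ** 2 for xk, Xk in zip(x, X))
```

Under these conventions the sum of squared deviations equals $e^2$, so that minimizing over the nuisance parameters $X_k$ leaves a criterion that depends on the data only through the orthogonal residual.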


Therefore, the ORM-estimator with loss function p(t) = Wp(f) (cf. Section 2) has
the same estimating equation that is obtained by differentiating the log-likelihood
function


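    $\ell(\beta, X_1, \ldots, X_n \mid x_1, \ldots, x_n) = n \log K - \sum_{i=1}^{n} \rho\left[(x_{0i} - \beta_0 - \beta_1X_{1i} - \cdots - \beta_pX_{pi})^2 + \sum_{k=1}^{p} (x_{ki} - X_{ki})^2\right]$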
Proof of Lemma 2.1:

Let $\varepsilon > 0$ and $K = \max_{1 \le i \le n} \|x_i\|$. Then for $i = 1, \ldots, n$

    $|a'x_i - c'x_i| \le \|a - c\| \cdot \|x_i\| \le K\|a - c\|$ .    (A.1)

If $\|a - c\| < \varepsilon/K = \delta$, then for $i = 1, \ldots, n$

    $c'x_i - \varepsilon \le a'x_i \le c'x_i + \varepsilon$ .    (A.2)

By monotonicity of $\psi$, it follows that for all $b_1 \in B(a)$

    $\sum_{i=1}^{n} \psi[c'x_i - (b_1 + \varepsilon)] \le \sum_{i=1}^{n} \psi[a'x_i - b_1] = 0 \le \sum_{i=1}^{n} \psi[c'x_i - (b_1 - \varepsilon)]$    (A.3)

and since $\sum_{i=1}^{n} \psi(c'x_i - b_2) = 0$, it follows that $b_1 - \varepsilon \le b_2 \le b_1 + \varepsilon$.
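
The continuity statement of Lemma 2.1 can be illustrated numerically. The sketch below solves $\sum_i \psi(a'x_i - b) = 0$ by bisection for two nearby directions $a$ and $c$ and compares the distance between the solutions with $K\|a - c\|$. It assumes Huber's $\psi$ as one example of a monotone score; the sample, the two directions, and the function names are hypothetical.

```python
import math
import random


def psi_huber(t, k=1.345):
    """Huber's score function: monotone non-decreasing and bounded."""
    return max(-k, min(k, t))


def location_root(values, psi=psi_huber, iterations=200):
    """Solve sum_i psi(v_i - b) = 0 in b by bisection.

    The sum is non-increasing in b, non-negative at min(values) and
    non-positive at max(values), so a root is bracketed from the start.
    """
    lo, hi = min(values), max(values)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if sum(psi(v - mid) for v in values) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


random.seed(1)
points = [(random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)) for _ in range(51)]

a = (1.0, 0.0)
c = (math.cos(0.01), math.sin(0.01))  # a nearby unit direction

K = max(math.hypot(x, y) for x, y in points)
dist_ac = math.hypot(a[0] - c[0], a[1] - c[1])

b_a = location_root([a[0] * x + a[1] * y for x, y in points])
b_c = location_root([c[0] * x + c[1] * y for x, y in points])
shift = abs(b_a - b_c)
```

The inequality `shift <= K * dist_ac` is the numerical counterpart of the conclusion $b_1 - \varepsilon \le b_2 \le b_1 + \varepsilon$ with $\varepsilon = K\|a - c\|$.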


Proof of Lemma 2.2: 

Let $\varepsilon > 0$, let $\delta$ be defined as in the proof of Lemma 2.1, and let $\|a - c\| < \delta$.
We may assume without losing generality that $F(a) \le F(c)$. By Lemma 2.1, if
$b_0 = F(a)$ then there exists $b \in B(c)$ such that $0 \le b - b_0 < \varepsilon$, and the lemma follows
since

    $b \le b_0 + \varepsilon \implies F(c) \le b_0 + \varepsilon \implies 0 \le F(c) - F(a) \le \varepsilon$ .    (A.4)


Proof of Lemma 5.1: 

Let $t > 0$. By (i) and (iii-b)

    $g(t) = \int_{-\infty}^{\infty} [\rho(y - t) - \rho(y)]f(y)\,dy$

    $= \int_{-\infty}^{t/2} [\rho(y - t) - \rho(y)]f(y)\,dy + \int_{t/2}^{\infty} [\rho(y - t) - \rho(y)]f(y)\,dy$

    $= \int_{t/2}^{\infty} [\rho(-y) - \rho(t - y)]f(t - y)\,dy - \int_{t/2}^{\infty} [\rho(y) - \rho(y - t)]f(y)\,dy$

    $= \int_{t/2}^{\infty} [\rho(y) - \rho(y - t)][f(y - t) - f(y)]\,dy \ge 0$ .    (A.5)

Finally,

    $[\rho(y) - \rho(y - t)][f(y - t) - f(y)]\big|_{y=t} = [\rho(t) - \rho(0)][f(0) - f(t)] > 0$    (A.6)

and the claim follows by (iv) and continuity of $\rho$ and $f$.
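
The sign of $g(t)$ can also be checked by numerical integration. The sketch below assumes a standard normal density for $f$ (symmetric and unimodal) and a bisquare loss rescaled to have supremum 1 for $\rho$ (even and non-decreasing on $[0, \infty)$); both choices, the integration range, and the function names are illustrative assumptions only.

```python
import math


def rho(t, c=2.0):
    """Bisquare loss rescaled to sup = 1: even and non-decreasing on [0, inf)."""
    if abs(t) >= c:
        return 1.0
    u = (t / c) ** 2
    return 1.0 - (1.0 - u) ** 3


def normal_pdf(y):
    """Standard normal density: symmetric and unimodal."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def g(t, half_width=12.0, steps=6000):
    """Trapezoid approximation of the integral of [rho(y - t) - rho(y)] f(y)."""
    h = 2.0 * half_width / steps
    total = 0.0
    for j in range(steps + 1):
        y = -half_width + j * h
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * (rho(y - t) - rho(y)) * normal_pdf(y)
    return total * h


# Values of g at a few hypothetical shifts t.
g_values = {t: g(t) for t in (0.0, 0.5, 1.0, 3.0)}
```

In this example $g(0) = 0$ and $g(t) > 0$ for the positive shifts, in agreement with (A.5) and (A.6).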


Proof of Lemma 5.4: 

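By definition of derivative we have that

    $\|\lambda(\theta) - \lambda(\theta_0) - D\lambda(\theta_0)(\theta - \theta_0)\| = o(\|\theta - \theta_0\|)$    (A.7)

that is,

    $\|\lambda(\theta) - D\lambda(\theta_0)(\theta - \theta_0)\| = o(\|\theta - \theta_0\|)$ .    (A.8)

Given $\varepsilon > 0$, there exists $\delta_0 > 0$ such that $\|\theta - \theta_0\| < \delta_0$ implies

    $\|\lambda(\theta) - D\lambda(\theta_0)(\theta - \theta_0)\| < \varepsilon\|\theta - \theta_0\|$ .    (A.9)

Now

    $\left|\,\|\lambda(\theta)\| - \|D\lambda(\theta_0)(\theta - \theta_0)\|\,\right| \le \|\lambda(\theta) - D\lambda(\theta_0)(\theta - \theta_0)\| \le \varepsilon\|\theta - \theta_0\|$

    $\implies -\varepsilon\|\theta - \theta_0\| \le \|\lambda(\theta)\| - \|D\lambda(\theta_0)(\theta - \theta_0)\|$

    $\implies \|\lambda(\theta)\| \ge \|D\lambda(\theta_0)(\theta - \theta_0)\| - \varepsilon\|\theta - \theta_0\| \ge \delta\|\theta - \theta_0\|$    (A.10)

because

    $\|D\lambda(\theta_0)(\theta - \theta_0)\| = \{(\theta - \theta_0)'[D\lambda(\theta_0)]'[D\lambda(\theta_0)](\theta - \theta_0)\}^{1/2}$

    $= \{(\theta - \theta_0)'A(\theta - \theta_0)\}^{1/2} = d_A(\theta, \theta_0)$    (A.11)

where $A = [D\lambda(\theta_0)]'[D\lambda(\theta_0)]$ is a positive definite matrix and $d_A(\theta, \theta_0)$ is a distance
on $\mathbb{R}^{p+1}$ which is equivalent to the Euclidean distance on $\mathbb{R}^{p+1}$ (that is,
there exist $\beta_1, \beta_2 > 0$ such that $\beta_1\|\theta - \theta_0\| \le d_A(\theta, \theta_0) \le \beta_2\|\theta - \theta_0\|$). We can
choose $\varepsilon < \beta_1$; hence $\delta = \beta_1 - \varepsilon > 0$.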


Proof of Lemma 5.5: 

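The lemma follows after we show that there exist $K_1 > 0$ and $K_2 > 0$ such
that for all $\theta, \tilde\theta$

    $\|\psi(x, \theta) - \psi(x, \tilde\theta)\| \le K_1(1 + \|x\|)\|x\|\,\|\theta - \tilde\theta\|$    (A.12)

    $\|\psi(x, \theta) - \psi(x, \tilde\theta)\| \le K_2\|x\|$ .    (A.13)

In fact by assumption (2) there exist $K_1 > 0$ and $K_2 > 0$ such that for all
$t_1, t_2 \in \mathbb{R}$

    $|\psi(t_1) - \psi(t_2)| \le K_1|t_1 - t_2|$    (A.14)

    $|\psi(t_1) - \psi(t_2)| \le K_2$ .    (A.15)

Let $\theta = (a, b)$ and $\tilde\theta = (\tilde a, \tilde b)$. Now

    $|(a'x - b) - (\tilde a'x - \tilde b)| \le \|a - \tilde a\|\,\|x\| + |b - \tilde b| \le (1 + \|x\|)\|\theta - \tilde\theta\|$    (A.16)

so that

    $\|\psi(x, \theta) - \psi(x, \tilde\theta)\| = \|[\psi(a'x - b) - \psi(\tilde a'x - \tilde b)]x\| \le K_1(1 + \|x\|) \cdot \|x\| \cdot \|\theta - \tilde\theta\|$    (A.17)

and

    $\|\psi(x, \theta) - \psi(x, \tilde\theta)\| \le |\psi(a'x - b) - \psi(\tilde a'x - \tilde b)| \cdot \|x\| \le K_2\|x\|$ .    (A.18)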


Proof of Lemma 5.6: 


For each fixed $(a, b)$ the lemma easily follows. Indeed, if
$y_i = \rho(a'x_i - b) - E\{\rho(a'x_i - b)\}$ then $E\{y_i\} = 0$ and
$\sum_{i=1}^{\infty} E\{y_i^2\}/i^2 \le 2M^2 \sum_{i=1}^{\infty} 1/i^2 < \infty$.
Hence Kolmogorov's theorem implies that $(1/n)\sum_{i=1}^{n} y_i \to 0$ a.s. as $n \to \infty$.

Let $\varepsilon > 0$ be fixed and let $t \ge 0$:

    $\frac{1}{n}\sum_{i=1}^{n} [\rho(a'x_i - b) - E\{\rho(a'x_i - b)\}]$

    $= \frac{1}{n}\sum_{i=1}^{n} [\rho(a'x_i - b) - E\{\rho(a'x_i - b)\}][I(\|X_i\| > t) + I(\|e_i\| > t)]$

    $+ \frac{1}{n}\sum_{i=1}^{n} [\rho(a'x_i - b) - E\{\rho(a'x_i - b)\}]I(\|X_i\| \le t)I(\|e_i\| \le t)$ .    (A.19)

The first term on the right hand side of (A.19) is bounded by
$2M[(1/n)\sum I(\|X_i\| > t) + (1/n)\sum I(\|e_i\| > t)]$. By (ix) this can be made less than
$\varepsilon/2$ for some $t > 0$ and all $n \ge N$.


The second term on the right hand side of (A.19) converges to 0 uniformly
on $C_K = \{(a, b) : \|a\| = 1, |b| \le K\}$. Indeed, if $(a_1, b_1)$ and $(a_2, b_2)$ are in $C_K$, for
$i = 1, 2, \ldots$, we have

    $\left|[\rho(a_1'x_i - b_1) - E\{\rho(a_1'x_i - b_1)\} - \rho(a_2'x_i - b_2) + E\{\rho(a_2'x_i - b_2)\}]I(\|X_i\| \le t)I(\|e_i\| \le t)\right|$

    $\le |\rho(a_1'x_i - b_1) - \rho(a_2'x_i - b_2)|I(\|X_i\| \le t)I(\|e_i\| \le t)$

    $+ E\{|\rho(a_1'x_i - b_1) - \rho(a_2'x_i - b_2)|I(\|X_i\| \le t)\}$ .    (A.20)

The first term on the right hand side of (A.20) can be made less than $\varepsilon/4$ for all
$i \in \mathbb{N}$ and all $\|a_1 - a_2\| + |b_1 - b_2| < \delta$ for some $\delta > 0$. The same is true of
the second term on the right hand side of (A.20). (Divide the domain
of integration into two parts, one of them being the set $\{e : \|e\| \le t_0\}$, for some $t_0$
large enough, and notice that for $\|e\| \le t_0$ and $\|X_i\| \le t$, $\rho(a'x_i - b)$ is a uniformly
continuous function of $(a, b)$.) Now use pointwise convergence of the second term
on the right hand side of (A.19), compactness of $C_K$, and the just-proved continuity
of $\rho(a'x_i - b) - E\{\rho(a'x_i - b)\}$, uniform in $i \in \mathbb{N}$ and $(a, b) \in C_K$, to get the
desired result.

Finally, $K > 0$ can be chosen so large that for all $|b| > K$, $\|a\| = 1$, and $i \in \mathbb{N}$

    $M - \frac{\varepsilon}{4} \le \rho(a'x_i - b)I(\|e_i\| \le t)I(\|X_i\| \le t) \le M$    (A.21)

and

    $M - \frac{\varepsilon}{4} \le E\{\rho(a'x_i - b)I(\|X_i\| \le t)\} \le M$ .    (A.22)

Hence, the second term on the right hand side of (A.19) can be made less than
$\varepsilon/2$ for all $n \ge N$, $\|a\| = 1$ and $b \in \mathbb{R}$, proving the lemma.
Proof of Lemma 5.7:

Since $\|a_n\| = 1$, we only need to show that there exists $K > 0$ such that
$|b_n| \le K$ a.s. for large $n$.

By Lemma 5.1 there exist $0 < p < 1$, $t_0 > 0$, and $\varepsilon > 0$ such that for all $|t| \ge t_0$

    $p\,E\{\rho(a'e + t)\} > E\{\rho(a'e)\} + \varepsilon$ .    (A.23)

By (ix) there exist $N \in \mathbb{N}$ and $t_1 > 0$ such that for all $n \ge N$ and $t \ge t_1$

    $\frac{n(t)}{n} \ge p$ .    (A.24)

Let $t_2 = \max\{t_0, t_1\}$. If $i \in I_n(t_2)$ and $|b| > 2t_2$ then

    $|b - a'X_i| \ge |b| - |a'X_i| \ge |b| - \|X_i\| \ge |b| - t_2 > t_0$ .    (A.25)

Finally, for $|b| > 2t_2$

    $\frac{1}{n}\sum_{i=1}^{n} \rho(a'x_i - b) = \frac{1}{n}\sum_{i=1}^{n} E\{\rho(a'x_i - b)\} + \frac{1}{n}\sum_{i=1}^{n} [\rho(a'x_i - b) - E\{\rho(a'x_i - b)\}]$

    $\ge \frac{1}{n}\sum_{i \in I_n(t_2)} E\{\rho(a'x_i - b)\} + \frac{1}{n}\sum_{i=1}^{n} [\rho(a'x_i - b) - E\{\rho(a'x_i - b)\}]$ .    (A.26)

By Lemma 5.6, the second term of (A.26) converges to 0 a.s., uniformly on $a$
and $b$ as $n \to \infty$. Regarding the first term we have

    $\frac{1}{n}\sum_{i \in I_n(t_2)} E\{\rho(a'x_i - b)\} = \frac{1}{n}\sum_{i \in I_n(t_2)} E\{\rho[a'e_i + (a'X_i - b)]\}$

    $\ge \frac{1}{n}\sum_{i \in I_n(t_2)} E\{\rho(a'e_i + t_0)\} = \frac{n(t_2)}{n} E\{\rho(a'e + t_0)\} \ge E\{\rho(a_0'e)\} + \varepsilon$    (A.27)

for large $n$.

On the other hand, by the SLLN

    $\frac{1}{n}\sum_{i=1}^{n} \rho(a_n'x_i - b_n) \le \frac{1}{n}\sum_{i=1}^{n} \rho(a_0'x_i - b_0) = \frac{1}{n}\sum_{i=1}^{n} \rho(a_0'e_i) \to E\{\rho(a_0'e)\}$  a.s.    (A.28)

Now the lemma follows from (A.27) and (A.28).


Proof of Lemma 5.8: 

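Since $\rho$ is bounded, the family $\{h_n\}$ is pointwise bounded. We will show now that $\{h_n\}$ is
equicontinuous.

Let $\varepsilon > 0$ be fixed. Let $\delta > 0$ be such that $|t - \tilde t| < \delta$ implies
$|\rho(t) - \rho(\tilde t)| < \varepsilon/2$. Let $\theta, \tilde\theta \in C$ be such that $\|a - \tilde a\| + |b - \tilde b| < \Delta$, for some $\Delta > 0$
to be chosen later.

By (ii.a), for any $K > 0$ we have

    $|h_n(\theta) - h_n(\tilde\theta)| = \left|\frac{1}{n}\sum_{i=1}^{n} [E\{\rho(a'e_i + a'X_i - b)\} - E\{\rho(\tilde a'e_i + \tilde a'X_i - \tilde b)\}]\right|$

    $\le \frac{1}{n}\sum_{i=1}^{n} E\{|\rho(Y + a'X_i - b) - \rho(Y + \tilde a'X_i - \tilde b)|\}$

    $\le \frac{1}{n}\sum_{i \in I_n(K)} E\{|\rho(Y + a'X_i - b) - \rho(Y + \tilde a'X_i - \tilde b)|\}$

    $+ \frac{1}{n}\sum_{i \notin I_n(K)} E\{|\rho(Y + a'X_i - b) - \rho(Y + \tilde a'X_i - \tilde b)|\}$    (A.29)

where $Y = a'e$. By (ix), we can choose $K$ such that $2(1 - n(K)/n)M < \varepsilon/2$ for all
$n \ge N$. We can now choose $\Delta > 0$ such that $2K\Delta < \varepsilon/2$. For this choice of $K$ and
$\Delta$ it follows that, for all $n \ge N$,

    $|h_n(\theta) - h_n(\tilde\theta)| \le \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon$ .    (A.30)

This proves the lemma; since $h_n(\theta)$ is a continuous function of $\theta$ for each fixed
$n$, the cases $n < N$ are immaterial to the proof.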


















Proof of Lemma 5.9: 


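Let $\theta = (a, b) \ne \theta_0$. By (x) there exist $N = N(\theta) \in \mathbb{N}$, $\varepsilon = \varepsilon(\theta) > 0$, and
$\delta = \delta(\theta) > 0$ such that for all $n \ge N$

    $\frac{n^*(\varepsilon)}{n} = \frac{\#\{i : |a'X_i - b| \ge \varepsilon\}}{n} \ge \delta$ .    (A.31)

By Lemma 5.1, there exists $\delta' > 0$ such that $E\{\rho(Y + \varepsilon)\} - E\{\rho(Y)\} = \delta' > 0$. Hence

    $h_n(\theta) = \frac{1}{n}\sum_{i=1}^{n} E\{\rho[Y + (a'X_i - b)]\}$

    $= \frac{1}{n}\left[\sum_{i \in I_n^*(\varepsilon)} E\{\rho[Y + (a'X_i - b)]\} + \sum_{i \notin I_n^*(\varepsilon)} E\{\rho[Y + (a'X_i - b)]\}\right]$

    $\ge \frac{1}{n}\left[n^*(\varepsilon)\left(E\{\rho(Y)\} + \delta'\right) + (n - n^*(\varepsilon))E\{\rho(Y)\}\right]$

    $= h_n(\theta_0) + \frac{n^*(\varepsilon)}{n}\delta' \ge h_n(\theta_0) + \delta\delta'$ .

Therefore, for $n \ge N$ we have

    $h_n(\theta) - h_n(\theta_0) > 0$    (A.32)

proving the lemma.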

